Methods and Tools for Prosodic Analysis of a Spoken Italian Corpus
Abstract
In the last few years, a number of initiatives have been carried out in Italy with the goal of collecting, annotating and making available a considerable amount of data on spoken Italian varieties. After a first phase, in which the AVIP corpus was collected and transcribed at both the segmental and suprasegmental levels, research efforts have now concentrated on corpus analysis, starting from two preliminary yet crucial aspects: a) developing strategies and software tools for checking the semantic coherence of the AVIP database; and b) designing a DBMS scheme that allows easy access to the data and renders the results of online queries in a user-friendly manner, also by means of dedicated graphical interfaces. This paper presents and discusses both aspects, focusing on the prosodic analysis of the database: the methodology followed in the intonation labelling phase and the strategies it led to in the implementation of software tools for prosodic analysis.
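The abstract does not specify the DBMS scheme or the coherence checks themselves. As a minimal sketch, assuming a relational layout with hypothetical `utterance` and `tone_label` tables and a hypothetical ToBI-style label inventory (not the actual AVIP one), the two aspects could be illustrated with Python's built-in `sqlite3`: a coherence check that flags unknown labels and labels time-stamped outside their utterance, plus a simple query that retrieves one speaker's labels.

```python
import sqlite3

# Hypothetical tonal label inventory; the abstract does not list the AVIP one.
ALLOWED_LABELS = {"H*", "L*", "L+H*", "H+L*", "L-", "H-", "L%", "H%"}

# Hypothetical schema: utterances with time boundaries, tone labels pointing to them.
SCHEMA = """
CREATE TABLE utterance (
    id INTEGER PRIMARY KEY,
    speaker TEXT NOT NULL,
    start_s REAL NOT NULL,
    end_s REAL NOT NULL,
    CHECK (start_s < end_s)
);
CREATE TABLE tone_label (
    id INTEGER PRIMARY KEY,
    utterance_id INTEGER NOT NULL REFERENCES utterance(id),
    time_s REAL NOT NULL,
    label TEXT NOT NULL
);
"""


def build_db(utterances, labels):
    """Create an in-memory database and load utterances and tone labels."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce utterance references
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO utterance VALUES (?, ?, ?, ?)", utterances)
    conn.executemany(
        "INSERT INTO tone_label (utterance_id, time_s, label) VALUES (?, ?, ?)",
        labels,
    )
    return conn


def coherence_errors(conn):
    """Return (label_id, message) pairs for labels that break the rules."""
    errors = []
    # Rule 1: every label must belong to the allowed inventory.
    for lid, label in conn.execute("SELECT id, label FROM tone_label ORDER BY id"):
        if label not in ALLOWED_LABELS:
            errors.append((lid, f"unknown label {label!r}"))
    # Rule 2: every label must fall inside the time span of its utterance.
    rows = conn.execute(
        """
        SELECT t.id, t.time_s FROM tone_label t
        JOIN utterance u ON u.id = t.utterance_id
        WHERE t.time_s < u.start_s OR t.time_s > u.end_s
        ORDER BY t.id
        """
    )
    for lid, t in rows:
        errors.append((lid, f"time {t} outside utterance"))
    return errors


def labels_for_speaker(conn, speaker):
    """Example query: all tone labels of one speaker, in temporal order."""
    return conn.execute(
        """
        SELECT u.id, t.time_s, t.label FROM tone_label t
        JOIN utterance u ON u.id = t.utterance_id
        WHERE u.speaker = ?
        ORDER BY u.id, t.time_s
        """,
        (speaker,),
    ).fetchall()
```

For example, with utterances `(1, "A", 0.0, 2.5)` and `(2, "B", 2.5, 4.0)` and labels `(1, 0.4, "H*")`, `(1, 2.4, "L%")`, `(2, 3.0, "X*")`, `(2, 5.0, "L%")`, `coherence_errors` reports the unknown `X*` label and the `L%` placed at 5.0 s, past the end of utterance 2. Keeping the checks as plain SQL queries means the same rules could, in principle, be rerun after every batch of new annotations.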
Similar sources
Using PiTagger for Lemmatization and PoS Tagging of a Spontaneous Speech Corpus: C-Oral-Rom Italian
The automatic lemmatization and morpho-syntactic annotation of spoken language is a quite recent and complex task for Natural Language Processing. The state of the art on written corpora does not provide us with a satisfactory level of analysis regarding spontaneous spoken language (Uchimoto et al., 2002; Moreno & Guirao, 2003). The spontaneous speech corpus Italian C-ORAL-ROM has been tagged with ...
The Prosody of Discourse Structure and Content in the Production of Persian EFL Learners
The present research addressed the prosodic realization of global and local text structure and content in the spoken discourse data produced by Persian EFL learners. Two newspaper articles were analyzed using Rhetorical Structure Theory. Based on these analyses, the global structure in terms of hierarchical level, the local structure in terms of the relative importance of text segments and the ...
The C-ORAL-ROM Project. New methods for spoken language archives in a multilingual romance corpus
C-ORAL-ROM is a multilingual corpus of spontaneous speech of around 1,200,000 words representing the four main Romance languages: French, Italian, Portuguese and Spanish. The resource will be delivered in standard textual format, aligned to the audio source in a multimedia edition. C-ORAL-ROM aims to ensure at the same time a sufficient representation of spontaneous speech variation in each la...
The Vienna Prosodic Speech Corpus: Purpose, Content and Encoding
This paper presents a corpus of spoken German especially designed for the investigation of prosodic properties of speech. After a short discussion of the content and set-up of the corpus, we describe in detail the additional linguistic information, introduced into the corpus by labelling and annotation. In this project, both qualitative and quantitative methods have been used for the acquisitio...
Prosodic Parallelism—Comparing Spoken and Written Language
The Prosodic Parallelism hypothesis claims adjacent prosodic categories to prefer identical branching of internal adjacent constituents. According to Wiese and Speyer (2015), this preference implies feet contained in the same phonological phrase to display either binary or unary branching, but not different types of branching. The seemingly free schwa-zero alternations at the end of some words ...